Implement Typed Documents and TypeRegistry #282

jterapin · 2025-03-06T21:47:57Z

Description: Implementation for Typed Documents and TypeRegistry. Currently only supports JSON documents.

It is highly likely that we will have to revisit this implementation in the future, since typed documents and the type registry are still evolving.

gems/smithy-schema/lib/smithy-schema/document.rb

gems/smithy-schema/lib/smithy-schema/type_registry.rb

gems/smithy-schema/spec/smithy-schema/document_spec.rb

gems/smithy-schema/lib/smithy-schema/document.rb

gems/smithy-schema/lib/smithy-schema/document_utils.rb

projections/shapes/lib/shapes/schema.rb

mullermp

Nice. Looking better. Still have a bunch of comments though. In general, I think we can still simplify, and also be less aggressive on validation and more permissive where possible.

mullermp · 2025-05-09T01:28:51Z

gems/smithy-schema/lib/smithy-schema/document.rb

+    #   shape = Smithy::Schema::StructureShape.new
+    #   data = Document::Data.new({ "name" => "example" }, shape: shape)
+    #
+    module Document


I was expecting document to be a class and have it be the delegator. What was the intention of making another data subclass?

From what I recall, I believe I wanted the Document (de)serializers to be nested under the Document namespace.

Reverted back to Document class approach with some class methods.
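For illustration, a minimal sketch of the "Document as a class that is also the delegator" idea discussed above. This is not the code that was merged: the `discriminator:` keyword and the use of Ruby's `SimpleDelegator` are assumptions made for the example.

```ruby
require 'delegate'

# Sketch only: a Document that wraps plain data and forwards data access
# to it. The discriminator keyword is an assumption for illustration.
class Document < SimpleDelegator
  attr_reader :discriminator

  def initialize(data, discriminator: nil)
    super(data) # SimpleDelegator forwards unknown methods to `data`
    @discriminator = discriminator
  end
end

document = Document.new(
  { 'name' => 'example' },
  discriminator: 'smithy.ruby.tests#Structure'
)
document['name']       # forwarded to the wrapped Hash
document.discriminator # defined on Document itself
```

With this shape there is no need for a separate `Document::Data` class; the wrapper itself behaves like the data it holds.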

gems/smithy-schema/lib/smithy-schema/document.rb

mullermp · 2025-05-09T01:36:53Z

gems/smithy-schema/lib/smithy-schema/type_registry.rb

+      # @param  [Hash<String, Shapes::StructureShape>] registry
+      def initialize(registry = {})
+        @registry = registry
+        @shapes_by_type = register_shape_types(registry.values)


Is there a way to populate this from the code generated side? If we must iterate shapes, we may as well backtrack and populate both maps in one pass. That at least reduces generated code. Otherwise is shapes_by_type even necessary?

Yeah, we can populate this from the code generated side or build it at runtime - depends on what we want. I'm not sure which would be preferred in a context like EC2, for example.

From what I gathered, we expect customers to use the type registry, so I felt that hiding this detail is better. I guess at that point you could say we might as well just accept an array of shapes so we can build both the registry and shapes_by_type (the approach before this iteration).

Any preference?

The shapes_by_type map is necessary to find the shape schema based on the runtime shape class, since a runtime shape does not know its own shape schema.

Ok - for now, I decided to just accept an array of shapes (just thinking about the EC2 schema file, oof). We can do a switcheroo later anyway - it's not firm.
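A rough sketch of the "accept an array of shapes and populate both maps in one pass" approach from this thread. The `StructureShape` struct, the example shape ids, and the `shape_by_id` method are hypothetical stand-ins; only `shape_by_type` is a name that appears in the diff.

```ruby
# Hypothetical stand-ins for the schema and runtime classes; the real
# Smithy::Schema classes are not reproduced here.
StructureShape = Struct.new(:id, :type)

class CitySummary; end
class NoSuchResource; end

class TypeRegistry
  # @param [Array<StructureShape>] shapes
  def initialize(shapes = [])
    @shapes_by_id = {}
    @shapes_by_type = {}
    # One pass fills both lookup tables.
    shapes.each do |shape|
      @shapes_by_id[shape.id] = shape
      @shapes_by_type[shape.type] = shape
    end
  end

  # Look up a shape schema by its shape id (hypothetical method name).
  def shape_by_id(id)
    @shapes_by_id[id]
  end

  # Look up a shape schema from a runtime class, since the runtime
  # object does not carry its own schema.
  def shape_by_type(type)
    @shapes_by_type[type]
  end
end

registry = TypeRegistry.new(
  [
    StructureShape.new('example.weather#CitySummary', CitySummary),
    StructureShape.new('example.weather#NoSuchResource', NoSuchResource)
  ]
)
```

Both maps stay internal here: callers go through the lookup methods, and generated code only has to emit a flat array of shapes.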

mullermp · 2025-05-09T01:37:19Z

gems/smithy-schema/lib/smithy-schema/type_registry.rb

+
+      # @api private
+      # @return [Hash<String, Shapes::StructureShape>]
+      attr_accessor :registry


Both of these accessors shouldn't exist. Our public methods should hide this detail.

Yup, that makes sense. Hope the changes I made make sense.

mullermp · 2025-05-09T02:37:52Z

gems/smithy-schema/lib/smithy-schema/document/serializer.rb

+        end
+
+        def typed_document?(values)
+          (values.is_a?(Smithy::Schema::Structure) && @type_registry.shape_by_type(values.class)) ||


Isn't this already checked? And wouldn't this always be true if it was a structure, because it would already be registered?

Yeah, I'll refactor, now that unions are not included in the type registry.

mullermp · 2025-05-09T02:39:34Z

gems/smithy-schema/lib/smithy-schema/document/serializer.rb

+          ref.shape.member(name) || find_member_ref_by_names(ref, name)
+        end
+
+        def find_member_ref_by_names(ref, name)


This seems inefficient. Similar to what we do in codecs, for structure and union, you will want to iterate the shape members and not the values, then you can check json name that way. You're doing a loop for every member, so it's n^2 performance.

Yeah - I streamlined it to closely follow how a general codec serializer works. The updated approach does that, but we still need some checks, since a given value key could be a jsonName, a symbolized key, or the memberName reflected on the modeled shape.
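As a sketch of the suggestion above (iterate the shape's members rather than the input values, and probe the input for each accepted key), assuming a hypothetical `Member` struct and a `serialize_members` helper that are not part of the PR:

```ruby
# Hypothetical member description: a member name plus an optional jsonName.
Member = Struct.new(:name, :json_name)

# Walks the shape's members once and probes the input hash for each
# accepted key, instead of scanning all members for every input key.
def serialize_members(members, values, use_json_name: false)
  members.each_with_object({}) do |member, out|
    # A value may be keyed by member name (String or Symbol) or by jsonName.
    candidates = [member.name, member.name.to_sym, member.json_name].compact
    key = candidates.find { |k| values.key?(k) }
    next if key.nil?

    # Prefer jsonName only when requested and present.
    out_name = (use_json_name && member.json_name) || member.name
    out[out_name] = values[key]
  end
end

members = [Member.new('cityId', 'city_id'), Member.new('name', nil)]
result = serialize_members(
  members,
  { cityId: '123', 'name' => 'Seattle' },
  use_json_name: true
)
# result => { 'city_id' => '123', 'name' => 'Seattle' }
```

Each member costs a constant number of hash lookups, so the whole pass is linear in the number of members rather than quadratic.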

mullermp · 2025-05-09T02:39:59Z

gems/smithy-schema/lib/smithy-schema/document/serializer.rb

+          end
+        end
+
+        def resolve_member_name(member_ref, opts)


Check out the location_name approach in my PR - we should use similar terms. You can easily handle this with || optionality.

Yup, brought that over.

gems/smithy-schema/lib/smithy-schema/document/deserializer.rb

mullermp · 2025-05-09T02:49:47Z

gems/smithy-schema/spec/smithy-schema/document/serializer_spec.rb

+      describe Serializer do
+        let(:shapes) do
+          shapes = SchemaHelper.sample_shapes
+          shapes['smithy.ruby.tests#Structure']['members']['timestampDateTime'] = {


I would prefer if you move these definitions closer to the test (in the actual tests where they are needed) - it's easier to manage tests that way if they are discrete.

That's fair - I'll move it.

alextwoods

Nice - it's generally looking good.
I think the functionality from the Document::Data class could be moved into Document as a class (unless there's some reason I'm missing). I also understand why the Document serializer and deserializer exist separately and require a type registry, but I think I would lean towards the public interface for serializing/deserializing documents living on the top-level class. It could still require a type registry to be provided and could use these classes under the hood to implement it (and they could then be api private).

mullermp · 2025-05-09T22:52:55Z

That's effectively what I was also saying but I agree.
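One way the suggested layout could look, as a sketch under stated assumptions: the `Document.serialize` class method, the plain Hash standing in for a type registry, and the `'__type'` output key are all hypothetical and chosen only to keep the example self-contained.

```ruby
class Document
  attr_reader :data, :discriminator

  def initialize(data, discriminator: nil)
    @data = data
    @discriminator = discriminator
  end

  # Public entry point; the type registry is still required, but callers
  # never touch the serializer class directly.
  def self.serialize(document, type_registry:, **opts)
    Serializer.new(type_registry).serialize(document, opts)
  end

  # Internal helper (would be marked @api private in the real code).
  class Serializer
    def initialize(type_registry)
      @type_registry = type_registry
    end

    def serialize(document, _opts = {})
      out = document.data.dup
      # Only tag the output when the registry knows the discriminator.
      # The '__type' key is an assumption for this sketch.
      known = @type_registry.key?(document.discriminator)
      out['__type'] = document.discriminator if known
      out
    end
  end
  private_constant :Serializer
end

# A plain Hash stands in for the type registry in this sketch.
registry = { 'smithy.ruby.tests#Structure' => :shape }
typed = Document.new({ 'name' => 'example' }, discriminator: 'smithy.ruby.tests#Structure')
typed_out = Document.serialize(typed, type_registry: registry)
untyped_out = Document.serialize(Document.new({ 'a' => 1 }), type_registry: registry)
```

The serializer stays a separate class, so it can hold the registry for nested typed documents, while the top-level class remains the only public surface.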

jterapin

I tried to look over every comment that I wrote (since I like to write them as I go to remember where I left off), so if something doesn't make sense, let me know! Thanks!

jterapin · 2025-05-14T15:51:17Z

gems/smithy-schema/lib/smithy-schema/document.rb

+    #   shape = Smithy::Schema::StructureShape.new
+    #   data = Document::Data.new({ "name" => "example" }, shape: shape)
+    #
+    module Document


From what I recall, I believe I wanted the Document (de)serializers to be nested under the Document namespace.

jterapin · 2025-05-14T17:05:09Z

gems/smithy-schema/lib/smithy-schema/document/serializer.rb

+  module Schema
+    module Document
+      # Serializes data into a document data.
+      class Serializer


Yeah... Ideally, I would like that, but for typed documents to work properly (deserializing nested typed documents), it needs to reference a specific type_registry to (de)serialize properly.

This is probably why Dowling wanted TypeRegistry to handle these responsibilities.

jterapin · 2025-05-14T17:06:36Z

gems/smithy-schema/lib/smithy-schema/document/serializer.rb

+        #   serializer.create_document("some document")
+        #   # => {"foo" => "bar"}


Yeah this is wrong, I'll fix.

jterapin · 2025-05-14T17:08:12Z

gems/smithy-schema/lib/smithy-schema/document/serializer.rb

+        #   # => an instance of Smithy::Schema::Document::Data
+        #   document.discriminator
+        #   # => "smithy.ruby.tests#Structure"
+        def create_document(data)


I could create a class method under Document that uses this class under the hood to handle it.

jterapin · 2025-05-14T17:08:34Z

gems/smithy-schema/lib/smithy-schema/document/serializer.rb

+        # @option opts [Boolean] :use_json_name Whether to use `jsonName` trait
+        #   or just member name. The `jsonName` trait is ignored by default.
+        # @return [Hash] Serialized document data
+        def serialize_document(document, opts = {})


Yeah I can make class methods at the Document level

jterapin · 2025-05-21T03:53:36Z

gems/smithy-schema/lib/smithy-schema/document/serializer.rb

+          ref.shape.member(name) || find_member_ref_by_names(ref, name)
+        end
+
+        def find_member_ref_by_names(ref, name)


Yeah - I streamlined it to closely follow how a general codec serializer works. The updated approach does that, but we still need some checks, since a given value key could be a jsonName, a symbolized key, or the memberName reflected on the modeled shape.

jterapin · 2025-05-21T03:54:05Z

gems/smithy-schema/lib/smithy-schema/document/serializer.rb

+          end
+        end
+
+        def resolve_member_name(member_ref, opts)


Yup, brought that over.

gems/smithy-schema/lib/smithy-schema/document/deserializer.rb

jterapin · 2025-05-21T03:58:30Z

gems/smithy/spec/support/examples/schema_examples.rb

+    # TODO: failing due to synthetic shape changes
+    # it 'contains a registry of typed shapes' do
+    #   expect(subject.keys).to match_array(typed_shapes.keys)
+    # end


Commented out failing test due to synthetic input/output shape id changes

jterapin · 2025-05-21T03:59:36Z

projections/weather/lib/weather/schema.rb

+
+    class << self
+      def type_registry
+        Smithy::Schema::TypeRegistry.new([CityCoordinates, CitySummary, GetCityInput, GetCityOutput, GetCurrentTimeOutput, GetForecastInput, GetForecastOutput, ListCitiesInput, ListCitiesOutput, NoSuchResource])


Could go back to generating the map, etc., but this would increase file size for services like EC2.

jterapin added 3 commits March 6, 2025 12:12

Add type registry prototype class

d33fcdf

Add type registry to codegenerated schema

5ff40cc

Update projections

66a6285

jterapin changed the title ~~Implement Typed Documents and TypeRegistry~~ [WIP] Implement Typed Documents and TypeRegistry Mar 6, 2025

jterapin added 24 commits March 7, 2025 13:39

Merge branch 'decaf' into typed_documents

c5e45ed

Merge branch 'decaf' into typed_documents

2bd15a2

Merge branch 'decaf' into typed_documents

03bfa82

Update requires

e6435d5

Add initial document implementation

0830827

Merge decaf into branch

877654f

Update to include cbor

a61318f

Expand on typed docs

4edfae3

Update file names

ff959f1

Merge branch 'decaf' into typed_documents

8b9b560

More refactoring

598db66

Merge branch 'decaf' into typed_documents

3a4c0d1

Remove scratches

269b2b5

Fix rubocop

90c58ce

Clean up document

a1e46cc

Clean document specs

8b666cd

Update TypeRegistry

2ddf4bd

Add documentation

112ddf4

Add TypeRegistry specs

6283813

Merge branch 'decaf' into typed_documents

efbfa5e

Add TypeRegistry tests

88ff845

Update projections

66b2cde

Update syntax

22998a0

Update projections

9afeacd

jterapin changed the title ~~[WIP] Implement Typed Documents and TypeRegistry~~ Implement Typed Documents and TypeRegistry Apr 15, 2025

jterapin commented Apr 15, 2025


mullermp reviewed May 9, 2025


alextwoods reviewed May 9, 2025


jterapin added 9 commits May 14, 2025 08:52

Update Document Data class discriminator

3ae063f

Remove data method out of Document::Data

fcbe801

Update unions for document deserializers

03a2d67

Fix serializer example

252456d

Update shape test definitions closer to test cases

1c59f21

Update schema heler requires

d0ef75e

Update handling of typed documents

669d90a

Remove unnecessary docs

c1ea6c9

Clean up document serializer

fe8271c

jterapin mentioned this pull request May 20, 2025

Defaults #297

Merged

jterapin added 6 commits May 20, 2025 09:17

merge from decaf

6b8fa2e

Comment out failing test due to synethic shape changes

dea7d14

Streamline documents

82a54f7

Type Registry fixes

cd380ed

Update projections

8bc6e17

Update docs

312d94e

jterapin commented May 21, 2025


jterapin requested review from mullermp and alextwoods May 21, 2025 04:00

mullermp added 5 commits May 22, 2025 10:34

Merge branch 'decaf' into typed_documents

612807a

Dynamic type registry shape population

1904852

Clean up of type registry implementation and spec

d404cbe

Clean up some docs and implementation

c5ba058

More renaming + doc fixes

cc9fdd2

mullermp approved these changes May 22, 2025


mullermp merged commit 97b7f61 into decaf May 22, 2025
17 checks passed

mullermp deleted the typed_documents branch May 22, 2025 18:09
